952. Largest Component Size by Common Factor
Hard

Given a non-empty array of unique positive integers nums, consider the following graph:

  • There are nums.length nodes, labelled nums[0] to nums[nums.length - 1];
  • There is an edge between nums[i] and nums[j] if and only if nums[i] and nums[j] share a common factor greater than 1.

Return the size of the largest connected component in the graph.

 

Example 1:

Input: nums = [4,6,15,35]
Output: 4

Example 2:

Input: nums = [20,50,9,63]
Output: 2

Example 3:

Input: nums = [2,3,6,7,4,12,21,39]
Output: 8

Note:

  1. 1 <= nums.length <= 20000
  2. 1 <= nums[i] <= 100000
Accepted
26,774
Submissions
73,418
Seen this question in a real interview before?
Companies
Related Topics

Average Rating: 4.61 (28 votes)

Premium

Solution


Overview

We could consider this problem as a graph partition problem.

Each number represents a node in a graph. We are asked to partition the nodes into several groups and find the largest one.

Suppose that we have a means to determine and assign each node to a proper group, then we could easily solve the problem with a single iteration.

We could summarize the algorithm with the following Python pseudo code, where literally we count the appearance of each group.

    group_count = {}
    for num in number_list:
        group_id = group(num)
        group_count[group_id] += 1
    return max(group_count.values())

The key to the above algorithm lies in the questions on how we define a group and how we assign an element to a group.

Given the above intuition, it might remind you one of the most well-known data structures in computer science called Disjoint Set, which tracks a set of elements partitioned into a number of disjoint (non-overlapping) subsets.

Union-Find Algorithm

Indeed, the Disjoint-Set data structure would be the essential building block to solve this problem.

The Disjoint-Set data structure is also known as the Union-Find data structure. Because it mainly consists of two operations: Union() and Find() defined as follows:

  • Find(x): get the identity of the group that the element x belongs to.

  • Union(x, y): merge the two groups that the two elements belong to respectively.

Here are the sample implementation of the Union-Find data structure, following the pseudo-code presented on the wiki page.

As one can see, the code is actually surprisingly concise, yet powerful. It could be even more concise, if we did not care about the load balancing during the merge of groups in the Union() function.

We would use the implementation of this Union-Find data structure in the following approaches.


Approach 1: Union-Find via Factors

Intuition

Now that we are equipped with the Union-Find structure, which greatly facilitates the group identification and group merge operations, we can now reformulate the problem with the help of Union-Find.

As we stated before, the problem can be considered as a graph partition problem where we group nodes into a list of subsets.

Each number in the input list is represented as a node in the graph. The connection between the nodes (i.e. edge) can happen, if and only if the two nodes share a common factor greater than one.

One naive idea would be that we enumerate all pairs of nodes, in order to partition the nodes into groups, with the help of Union-Find data structure as we implemented. This could pass some test cases, though it would exceed the time limit for tougher cases, since the algorithm has a quadratic time complexity.

A more efficient idea would be that we build groups led by each of the common factors of the numbers. This can be done in a single iteration over each of the number.

For each number, we enumerate all factors that can divide the number, and then we attribute the number to each group led by the factor, i.e. Union(num, factor).

Venn diagram

As one can see in the above example, essentially we build a Venn diagram, where each subset contains a series of numbers as well as factors. Take the input number 6 as an example, it can be divided both by the factors of 2 and 3. As a result, it can be attributed to both groups that has the factors respectively. And thanks to the number 6, eventually the groups led by 2 and 3 respectively can be merged together. At the end, all the input numbers can be attributed to a single big group, thanks to all the joints among the subgroups.

Algorithm

With the above intuition, we could implement the algorithm in two general steps:

  • Step 1). Attribute each number to a series of groups that are led by each of its factors.

    • We iterate through each number, denoted as num. For each number, we iterate from 2 to sqrt(num) to find out all the factors.

    • For each factor of num, we merge the groups of that possess the element num and factor respectively, i.e. Union(num, factor).

    • In addition, we perform the same union operation on the complement factor as well, i.e. Union(num, num / factor).

  • Step 2). Iterate through each number a second time, to find out the final group that the number belongs to.

    • With the mapping between the number and its group ID, i.e. {num -> group_id}, it is intuitive to find out the group that has the most elements.

Note: One might encounter TLE (Time Limit Exceeded) exception with the above solution (especially in Python), when the online judge is under load. But under the normal circumstance, the solution would be accepted, which is even faster than 30% of submissions.

Complexity Analysis

Since we applied the Union-Find data structure in our algorithm, we would like to start with a statement on the time complexity of the data structure, as follows:

Statement: If MM operations, either Union or Find, are applied to NN elements, the total run time is O(MlogN)\mathcal{O}(M \cdot \log^{*}{N}), where log\log^{*} is the iterated logarithm.

One can refer to the proof of Union-Find complexity for more details.

In our case, the number of elements in the Union-Find data structure is equal to the maximum number of the input list, i.e. max(A).

Let NN be the number of elements in the input list, and MM be the maximum value of the input list.

  • Time Complexity: O(NMlogM)\mathcal{O}(N \cdot \sqrt{M} \cdot \log^{*}{M})

    • The number of factors for a given number is bounded by O(M)\mathcal{O}(\sqrt{M}). Assuming that any number that is less than M\sqrt{M} can be divided by MM, we would then have 2M2 \cdot \sqrt{M} pairs of factors.

    • In the first step, we iterate through each number (i.e. NN iterations), and for each number, we iterate through all its factors (i.e. up to 2M2 \cdot \sqrt{M} iterations). As a result, the time complexity of this step would be O(NMlogM)\mathcal{O}(N \cdot \sqrt{M} \cdot \log^{*}{M}).

    • In the second step, we iterate through each number again. But this time, for each iteration we perform only once the Union-Find operation. Hence, the time complexity for this step would be O(NlogM)\mathcal{O}(N \cdot \log^{*}{M}).

    • To sum up, the overall complexity of the algorithm would be O(NMlogM)+O(NlogM)=O(NMlogM)\mathcal{O}(N \cdot \sqrt{M} \cdot \log^{*}{M}) + \mathcal{O}(N \cdot \log^{*}{M}) = \mathcal{O}(N \cdot \sqrt{M} \cdot \log^{*}{M}) .

  • Space Complexity: O(M+N)\mathcal{O}(M + N)

    • The space complexity of the Union-Find data structure is O(M)\mathcal{O}(M).

    • In the main algorithm, we use a hash table to keep track of the account for each group. In the worst case, each number forms an individual group. Therefore, the space complexity of this hash table is O(N)\mathcal{O}(N).

    • To sum up, the overall space complexity of the algorithm is O(M)+O(N)=O(M+N)\mathcal{O}(M) + \mathcal{O}(N) = \mathcal{O}(M + N).


Approach 2: Union-Find on Prime Factors

Intuition

One might notice that in the above algorithm, we would enumerate through a series of non-essential factors for a number.

For instance, for the number 12, we have a number of factors as [2, 3, 4, 6]. In this case, the factors of [4, 6] are not essential, since they have been covered by the prime factors of [2, 3]. If there is another number (say 30) that has a common factor with the number 12, then this common factor is either one of the prime factors of [2, 3] or it can be further divided by these prime factors.

The intuition is that the prime factors of a number can represent all of its factors, i.e. the integer can be characterized by a series of prime factors.

Indeed, "By the fundamental theorem of arithmetic, every positive integer has a unique prime factorization", as we quote from wikipedia.

Each positive integer (except 1) can be decomposed into a series of prime numbers, e.g. 12=223 12 = 2 * 2 * 3.

With the above theories, rather than enumerating all the factors of a number, we just need to enumerate the prime factors of a number, in our Union-Find data structure.

Sieve Method

Before proceeding to the main algorithm for this problem, let us briefly list the algorithm to decompose a number into a series of prime factors, which itself is not an easy problem.

Here we apply the sieve of Eratosthenes (let's call it sieve method for short), an ancient algorithm to calculate all prime factors up to any given limit.

Algorithm

Now that we know how to decompose a number into a series of prime factors, we can simply replace common factors in the previous approach with the prime factors. This could work.

However, there is another arguably more efficient method, which is that rather than Union-Find on all numbers together with its prime factors, we do the Union-Find solely on the prime factors, excluding the numbers.

We could therefore have much smaller set of elements for the Union-Find operations. We illustrate how it could work on the same example before in the following graph.

Union-Find on prime factors

Similar with the previous approach, we could implement the algorithm in two general steps:

  • Step 1). Decompose each number into its prime factors and apply Union() operations on the series of prime factors.

    • We iterate through each number, denoted as num. For each number, we decompose it into prime factors.

    • We join all groups that possess these prime factors, by applying Union() operation on each adjacent pair of prime factors.

    • In addition, we use a hash table to keep the mapping between each number and its any of prime factors. Later, we would use this table to find out which group that each number belongs to.

  • Step 2). Iterate through each number a second time, to find out the final group that the number belongs to.

    • Since we build Union-Find sets solely on the prime factors, we could find out which group that each prime factor belongs to, i.e. prime_factor -> group_id.

    • Thanks to the mapping between the number and its prime factor, i.e. {num -> prime_factor}, we could now find out which group that each number belongs with the above Union-Find sets, i.e. num -> prime_factor -> group_id.

Complexity Analysis

Let NN be the number of elements in the input list, and MM be the maximum value of the input list.

  • Time Complexity: O(N(log2MlogM+M))\mathcal{O}\big(N \cdot (\log_{2}{M} \cdot \log^{*}{M} + \sqrt{M}) \big)

    • First of all, the time complexity of the sieve method to calculate the prime factors of is O(M)\mathcal{O}(\sqrt{M}).

    • It is hard to estimate the number of prime factors for a given number. Since the smallest prime number is 2, a coarse upper bound for the number of the prime factors is log2M\log_{2}{M}, e.g. 8=222 8 = 2 * 2 * 2.

    • In the first step, we iterate through each number (i.e. NN iterations), and for each number, we iterate through all its factors (i.e. up to log2M\log_{2}{M} iterations). As a result, together with the calculation of prime factors, the time complexity of this step would be O(Nlog2MlogM)+O(NM)=O(N(log2MlogM+M))\mathcal{O}(N \cdot \log_{2}{M} \cdot \log^{*}{M}) + \mathcal{O}(N \cdot \sqrt{M}) = \mathcal{O}\big(N \cdot (\log_{2}{M} \cdot \log^{*}{M} + \sqrt{M}) \big).

    • In the second step, we iterate through each number again. But this time, for each iteration we perform only once the Union-Find operation, i.e. O(NlogM)\mathcal{O}(N \cdot \log^{*}{M}).

    • To sum up, the overall complexity of the algorithm would be O(N(log2MlogM+M))\mathcal{O}\big(N \cdot (\log_{2}{M} \cdot \log^{*}{M} + \sqrt{M}) \big).

    • As one might notice that, the asymptotic complexity of this approach seems to be inferior than the previous approach, due to the calculation of prime factors. However, in reality, the saving we gain on the Union-Find operations could outweigh the cost of prime factor calculation.

  • Space Complexity: O(M+N)\mathcal{O}(M + N)

    • The space complexity of the Union-Find data structure is O(M)\mathcal{O}(M).

    • In the main algorithm, we use a hash table to keep track of the count for each group. In the worst case, each number forms an individual group. Therefore, the space complexity of this hash table is O(N)\mathcal{O}(N).

    • In addition, we keep a map between each number and one of its prime factors. Hence the space complexity of this map is O(N)\mathcal{O}(N).

    • To sum up, the overall space complexity of the algorithm is O(M)+O(N)+O(N)=O(M+N)\mathcal{O}(M) + \mathcal{O}(N) + \mathcal{O}(N) = \mathcal{O}(M + N).



Report Article Issue

Comments: 7
box_of_donuts's avatar
Read More

"The number of factors for a given number is bounded by \log_{2}M" Am I missing something or is this blatantly false? Consider the number M = 2*3*5*7*11*13*17*19. This number has 2^8 - 1 = 255 factors excluding 1. But log_2(M) ~ 23.21.

So I think it would be better to just naively upper bound the number of factors by sqrt(M) and state that solution 1 runs in O(N sqrt(M)) (and a solution with this bound is what one would realistically give in an interview to be considered "good").

4
Show 3 replies
Reply
Share
Report
virtyaluk's avatar
Read More

This is a great article by far. I find the first solution is much cumbersome and intuitive rather than the second. Though the second solution doesn't look like one could come up with in a reasonable time during the interview. Thanks.

3
Reply
Share
Report
zerdoff's avatar
Read More

one of the most well-known data structures in computer science called Disjoint Set

I doubt that it's well-known. I just learned it from this article.
Thank you for educating me!

3
Reply
Share
Report
sraj7's avatar
Read More

Average number of factors for a given number M: log_2(M)
In worst Case: sqrt(M)
https://terrytao.wordpress.com/2008/09/23/the-divisor-bound/

However, the runtime of each iteration of the first loop in Approach 1 is always dominated by: factor < (int)(Math.sqrt(num))+1 , i.e. sqrt(M).

0
Reply
Share
Report
133c7's avatar
Read More

Nice. My first approach was using DFS but it was not clear how to eliminate the bottleneck of considering elements pairwise to determine a factor. Therefore the solution was O(N^2) and was timing out. Using Union-Find and factors themselves as nodes is a clever way to avoid that bottleneck. Nice article.

Below is a DFS solution:

class Solution {
    public int largestComponentSize(int[] A) {
        int n = A.length;
        if (n <= 1) {
            return n;
        }
        
        List<List<Integer>> graph = new ArrayList();
        
        for (int i = 0; i < n; i++) {
            graph.add(new ArrayList());
        }
        
        // construct graph
        for (int i = 0; i < n-1; i++) {
            for (int j = i+1; j < n; j++) {
                if (hasFactorGtOne(A[i], A[j])) {
                    graph.get(i).add(j);
                    graph.get(j).add(i);
                }
            }
        }
        
        boolean[] visited = new boolean[n];
        int[] ccSize = new int[n];
        
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                dfs(graph, i, i , visited, ccSize);
            }
        }
        
        int ans = 0;
        for (int i = 0; i < n; i++) {
            ans = Math.max(ans, ccSize[i]);
        }
        
        return ans;
    }
    
    private void dfs(List<List<Integer>> graph, int src, int ccIdx, boolean[] visited, int[] ccSize) {
        visited[src] = true;
        ccSize[ccIdx]++;
        
        for (int w : graph.get(src)) {
            if (!visited[w]) {
                dfs(graph, w, ccIdx, visited, ccSize);
            }
        }
    }
    
    private boolean hasFactorGtOne(int a, int b) {
        for (int i = 2; i <= Math.min(a, b); i++) {
            if (a % i == 0 && b % i == 0) {
                return true;
            }
        }
        
        return false;
    }
}

0
Reply
Share
Report
zcoder_93's avatar
Read More

For Union and Find, we dont have to implement the whole class yeah? Something like below still works fine?

class Solution {
    public int largestComponentSize(int[] A) {
        int aL = A.length;
        Arrays.sort(A);
        int max = A[aL - 1];
        int[] parents = new int[max + 1];
        for (int i = 0; i <= max; i++)
            parents[i] = i;
        
        for (int i = 0; i < aL; i++) {
            int node = A[i];
            int factor = 2;
            while (factor < (int) Math.sqrt(node) + 1) {    
                if (node % factor == 0) {
                    union(node, factor, parents);
                    union(node, node/factor, parents);
                }
                factor++;
            }
        }
        
        int maxCount = 0;
        Map<Integer, Integer> countMap = new HashMap<>();
        for (int n: A) {
            int g = find(n, parents);
            countMap.put(g, countMap.getOrDefault(g, 0) + 1);
            maxCount = Math.max(maxCount, countMap.get(g));
        }
        return maxCount;
    }
    
    private void union(int n1, int n2, int[] parents) {
        int p1 = find(n1, parents);
        int p2 = find(n2, parents);
        if (p1 != p2) {
            parents[p2] = p1; 
        } 
    }
    
    private int find(int n, int[] parents) {
        if (parents[n] != n) 
            parents[n] = find(parents[n], parents);
        return parents[n];
    }
}

0
Reply
Share
Report
alvin-777's avatar
Read More

Approach 2, at the beginning, 8 is not a factor of 12...

0
Show 1 reply
Reply
Share
Report

You don't have any submissions yet.

/23
Contribute
|||
Saved
Trie
This list is empty.